Get Pages (Web Mining)
Synopsis
Gets pages from URLs in an attribute and stores them in a new attribute.
Description
This operator retrieves the pages whose URLs are contained in the input data set. For each row in the data set, the URL is extracted from the specified link attribute. A request (for example a GET request) is sent to that URL, and the returned page is stored in a new attribute whose name is given by the page attribute parameter.
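The per-row loop described above can be sketched in Python with the standard library's `urllib.request`. This is a minimal illustration under stated assumptions, not the operator's implementation: the example set is modeled as a list of dicts, and the function name `get_pages` and its parameters are hypothetical stand-ins for the link attribute, page attribute, user agent, and timeout parameters.

```python
import urllib.request


def get_pages(rows, link_attribute, page_attribute,
              user_agent="Mozilla/5.0", timeout_ms=10000):
    """Fetch the page for each row's URL and store it under page_attribute.

    Hypothetical sketch: `rows` is a list of dicts standing in for the
    example set; each dict is updated in place and the list is returned.
    """
    for row in rows:
        # Extract the URL from the link attribute of this row.
        url = row[link_attribute]
        request = urllib.request.Request(
            url, headers={"User-Agent": user_agent}, method="GET")
        # urllib takes a single timeout in seconds for socket operations;
        # the operator's separate connection and read timeouts are not
        # modelled here. urllib follows HTTP redirects by default.
        with urllib.request.urlopen(request, timeout=timeout_ms / 1000) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            # Store the decoded page in the new page attribute.
            row[page_attribute] = response.read().decode(charset, errors="replace")
    return rows


rows = [{"url": "data:text/plain,hello"}]
print(get_pages(rows, "url", "page")[0]["page"])
```

The usage line uses a `data:` URL so the sketch runs without network access; with an `http://` or `https://` URL the same code performs a real GET request.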
Input
- Example Set (Data table)
The example set whose link attribute contains the URLs to retrieve.
Output
- Example Set (Data table)
The input example set with the retrieved pages stored in the new page attribute.
Parameters
- link attribute: The attribute that contains the URLs.
- page attribute: The name of the attribute that should contain the pages.
- random user agent: Chooses a user agent at random from a set of 7000 user agents.
- user agent: The user agent sent with the request.
- connection timeout: The timeout (in ms) for the connection.
- read timeout: The timeout (in ms) for reading from the URL.
- follow redirects: Specifies whether redirects should be followed.
- accept cookies: Specifies whether cookies should be accepted.
- cookie scope: Specifies the scope of the cookies used.
- request method: Specifies the request method.
- delay: Specifies whether execution is not delayed, delayed by a fixed amount of time, or delayed by a random amount of time.
- delay amount: The delay amount in ms.
- min delay amount: The minimum delay amount in ms.
- max delay amount: The maximum delay amount in ms.
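The three delay modes and their amount parameters listed above could be modeled as follows. This is a hypothetical Python sketch, not the operator's code: the mode names, the function `delay_ms`, the assumption that the delay is applied before each request, and the choice of a uniform integer draw between the minimum and maximum (inclusive) for the random mode are all assumptions for illustration.

```python
import random
import time

# Hypothetical names for the three modes of the "delay" parameter.
NONE, FIXED, RANDOM = "none", "fixed", "random"


def delay_ms(mode, delay_amount=0, min_delay=0, max_delay=0, rng=random):
    """Return how many milliseconds to wait before the next request."""
    if mode == NONE:
        return 0
    if mode == FIXED:
        # Fixed mode uses the "delay amount" parameter.
        return delay_amount
    if mode == RANDOM:
        # Random mode: assumed uniform integer between min and max, inclusive.
        if min_delay > max_delay:
            raise ValueError("min delay amount must not exceed max delay amount")
        return rng.randint(min_delay, max_delay)
    raise ValueError(f"unknown delay mode: {mode!r}")


def wait(mode, **kwargs):
    """Sleep for the computed delay (time.sleep takes seconds)."""
    time.sleep(delay_ms(mode, **kwargs) / 1000)
```

Passing a seeded `random.Random` instance as `rng` makes the random mode reproducible, which is convenient when testing a crawl schedule.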